Empirical Analysis of Policy Gradient Algorithms where Starting States are Sampled accordingly to Most Frequently Visited States

نویسندگان
چکیده

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Signal-to-Noise Ratio Analysis of Policy Gradient Algorithms

Policy gradient (PG) reinforcement learning algorithms have strong (local) convergence guarantees, but their learning performance is typically limited by a large variance in the estimate of the gradient. In this paper, we formulate the variance reduction problem by describing a signal-to-noise ratio (SNR) for policy gradient algorithms, and evaluate this SNR carefully for the popular Weight Per...

متن کامل

Most energetic passive states.

Passive states are defined as those states that do not allow for work extraction in a cyclic (unitary) process. Within the set of passive states, thermal states are the most stable ones: they maximize the entropy for a given energy, and similarly they minimize the energy for a given entropy. Here we find the passive states lying in the other extreme, i.e., those that maximize the energy for a g...

متن کامل

Where are the states of a black hole ? 1

We argue that bound states of branes have a size that is of the same order as the horizon radius of the corresponding black hole. Thus the interior of a black hole is not ‘empty space with a central singularity’, and Hawking radiation can pick up information from the degrees of freedom of the hole. Talk given at ‘Quantum Theory and Symmetries’, Cincinnati, September 2003.

متن کامل

Bayesian Policy Gradient Algorithms

Policy gradient methods are reinforcement learning algorithms that adapt a parameterized policy by following a performance gradient estimate. Conventional policy gradient methods use Monte-Carlo techniques to estimate this gradient. Since Monte Carlo methods tend to have high variance, a large number of samples is required, resulting in slow convergence. In this paper, we propose a Bayesian fra...

متن کامل

Comparing Policy-Gradient Algorithms

We present a series of formal and empirical results comparing the efficiency of various policy-gradient methods—methods for reinforcement learning that directly update a parameterized policy according to an approximation of the gradient of performance with respect to the policy parameter. Such methods have recently become of interest as an alternative to value-function-based methods because of ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: IFAC-PapersOnLine

سال: 2020

ISSN: 2405-8963

DOI: 10.1016/j.ifacol.2020.12.2279